Deep Coalescence Reconciliation with Unrooted Gene Trees: Linear Time Algorithms

Authors

  • Pawel Górecki
  • Oliver Eulenstein
Abstract

Gene tree reconciliation problems invoke the minimum number of evolutionary events that reconcile gene evolution within the context of a species tree. Here we focus on the deep coalescence (DC) problem, that is, given an unrooted gene tree and a rooted species tree, find a rooting of the gene tree that minimizes the number of DC events, or DC cost, when reconciling the gene tree with the species tree. We describe an O(n) time and space algorithm for the DC problem, where n is the size of the input trees, which improves on the time complexity of the best-known solution by a factor of n. Moreover, we provide an O(n) time and space algorithm that computes the DC scores for each rooting of the given gene tree. We also describe intriguing properties of the DC cost, which can be used to identify credible rootings in gene trees. Finally, we demonstrate the performance of our new algorithms in an empirical study using data from public databases.

Species trees represent the evolutionary history of species and play a key role in a broad spectrum of applications, including comparative genomics, population divergence, and understanding patterns of diversification [3]. There has also been an increased interest in species trees to maintain biodiversity [7,10], and to study the effects of global change [9,19,21]. Species trees are traditionally inferred from the evolutionary history of gene families. Therefore, it is assumed that the history of a gene family, which can be represented as a gene tree, and the corresponding species tree are identical. While accurate species trees are crucial for their proper interpretation, for many gene families their gene trees disagree with the topology of the actual species tree [15,14,16]. Complex evolutionary events like deep coalescence, gene duplication and subsequent loss, and horizontal gene transfer can cause tremendous heterogeneity in gene trees that obscures species relationships.
Reconciling such gene trees with a species tree by invoking the minimum number of evolutionary events, or reconciliation cost, is a common and well-studied approach to address these complications, and extensions of this approach are used to infer species trees [4,11,15,14,16]. While classical reconciliation problems are only applicable to rooted trees, there is a need to reconcile unrooted gene trees with rooted species trees. Most standard inference methods, like maximum parsimony or maximum likelihood, infer only unrooted gene trees from molecular sequences. When evolutionary events cause heterogeneity in gene trees, it is difficult, if not impossible, to correctly root them [18,20]. In contrast, species trees often encompass a trusted root, which, for example, can be based on the NCBI taxonomy [17]. However, reconciliation problems can be naturally extended to reconcile an unrooted gene tree with a rooted species tree, by seeking a rooting of the gene tree that invokes the minimum reconciliation cost when reconciled with the species tree [12,22]. In addition, such reconciliation problems provide a direct mechanism to root gene trees based on their reconciliation score. Here we focus on the DC problem, which is defined as follows: find the rooting of a given unrooted gene tree that minimizes the number of deep coalescence events, called DC cost, when reconciled with a given rooted species tree. This problem can be solved in quadratic time [22]. When faced with the need to reconcile large collections of trees with thousands of genes, however, quadratic runtime becomes prohibitive. For example, the gene tree parsimony (GTP) problem for deep coalescence [23] seeks, for a given collection of gene trees, a species tree with the overall minimum DC cost. Local search heuristics aim to solve this intrinsically difficult problem by solving the DC problem for thousands to hundreds of thousands of instances that reconcile gene trees with typically large species trees [1,5].
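
To make the problem statement concrete, the following is a minimal Python sketch of a naive baseline that tries every rooting of a small unrooted gene tree and scores each one against a rooted species tree. It is not the linear-time algorithm announced in the abstract. It assumes one common formulation of the DC cost, in which every gene tree node is mapped to the least common ancestor of the species below it and the cost is the total number of species tree edges spanned by the gene tree edges; the abstract does not spell out the definition, so this choice is an assumption. All names (`node_depths`, `lca`, `dc_cost_for_rooting`, `all_rooting_costs`) and the toy trees are hypothetical.

```python
def node_depths(parent):
    """Depth of every species tree node; the root has parent None."""
    depth = {}

    def d(v):
        if v not in depth:
            depth[v] = 0 if parent[v] is None else d(parent[v]) + 1
        return depth[v]

    for v in parent:
        d(v)
    return depth


def lca(u, v, parent, depth):
    """Least common ancestor by walking up parent pointers (naive)."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def dc_cost_for_rooting(adj, leaf_species, root_edge, parent, depth):
    """Assumed DC cost of the gene tree rooted on root_edge."""
    total = 0

    def visit(node, came_from):
        nonlocal total
        if node in leaf_species:
            return leaf_species[node]
        child_maps = [visit(n, node) for n in adj[node] if n != came_from]
        m = child_maps[0]
        for c in child_maps[1:]:
            m = lca(m, c, parent, depth)
        # Species tree edges between each child's mapping and this node's mapping.
        total += sum(depth[c] - depth[m] for c in child_maps)
        return m

    a, b = root_edge
    ma, mb = visit(a, b), visit(b, a)
    m = lca(ma, mb, parent, depth)  # mapping of the new root placed on the edge
    total += (depth[ma] - depth[m]) + (depth[mb] - depth[m])
    return total


def all_rooting_costs(adj, leaf_species, parent):
    """Score every edge of the unrooted gene tree as a candidate root."""
    depth = node_depths(parent)
    costs = {}
    for u in adj:
        for v in adj[u]:
            edge = tuple(sorted((u, v)))
            if edge not in costs:
                costs[edge] = dc_cost_for_rooting(adj, leaf_species, edge, parent, depth)
    return costs


# Toy species tree ((a,b),c) given as child -> parent.
species_parent = {"r": None, "ab": "r", "c": "r", "a": "ab", "b": "ab"}
# Toy unrooted gene tree: one internal node x joined to three leaves.
gene_adj = {"x": ["ga", "gb", "gc"], "ga": ["x"], "gb": ["x"], "gc": ["x"]}
gene_leaf_species = {"ga": "a", "gb": "b", "gc": "c"}

costs = all_rooting_costs(gene_adj, gene_leaf_species, species_parent)
best_edge = min(costs, key=costs.get)
print(best_edge, costs[best_edge])
```

Because each candidate rooting is rescored from scratch, the work grows at least quadratically with tree size, which illustrates why a method that scores all rootings in a single pass matters when many reconciliations are needed.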

Similar resources

Algorithms for Unrooted Gene Trees with Polytomies

Gene tree reconciliation is a method to reconcile gene trees that are confounded by complex histories of gene duplications with a provided species tree. The trees involved are required to be rooted and full binary. Reconciling gene trees allows not only to identify and study such histories for gene families, but is also the base for several higher level applications including the estimation of ...

From gene trees to species trees II: Species tree inference in the deep coalescence model

When gene copies are sampled from various species, the resulting gene tree might disagree with the containing species tree. The primary causes of gene tree and species tree discord include lineage sorting, horizontal gene transfer, and gene duplication and loss. Each of these events yields a different parsimony criterion for inferring the (containing) species tree from gene trees. With lineage ...

A Reconciliation with Non-binary Gene Trees Revisited

By reconciling the phylogenetic tree of a gene family with the corresponding species tree, it is possible to infer lineage-specific duplications and losses with high confidence and hence to annotate orthologs and paralogs. The currently available reconciliation methods for non-binary gene trees are computationally expensive for genome-scale applications. We present four O(|G| + |S|) algorithms ...

Algorithms for MDC-Based Multi-Locus Phylogeny Inference: Beyond Rooted Binary Gene Trees on Single Alleles

One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is Minimize Deep Coalescence (MDC). Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene tree...

Algorithms for MDC-Based Multi-locus Phylogeny Inference

One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is minimize deep coalescence, or MDC. Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene tr...

Publication date: 2012